Skip to content

[superlog] Downgrade recoverable AI tool errors from ERROR to WARN in job-level wide event#483

Open
superlog-app[bot] wants to merge 1 commit into
stagingfrom
superlog/fix-tool-logger-error-escalation
Open

[superlog] Downgrade recoverable AI tool errors from ERROR to WARN in job-level wide event#483
superlog-app[bot] wants to merge 1 commit into
stagingfrom
superlog/fix-tool-logger-error-escalation

Conversation

@superlog-app

@superlog-app superlog-app Bot commented Jun 16, 2026

Copy link
Copy Markdown

Summary

Insights generation jobs that encounter a recoverable AI tool failure (e.g. a ClickHouse SQL query error, an annotation creation returning NOT_FOUND) emit their final job-completion log at ERROR severity even when job_status: "succeeded". This creates false-positive Superlog incidents on every scheduled batch run.

Root cause

createToolLogger.error() in packages/ai/src/ai/tools/utils/logger.ts calls requestLogger.error(err, …), which sets evlog's internal hasError = true on the job-scoped wide event. When the job runner calls logger.emit({ job_status: "succeeded" }) at the end of a successful run, evlog's emit() fires at ERROR level because hasError is still set — even though the AI agent recovered from the tool failure and the job completed successfully.

Fix

Change the request-scoped path in createToolLogger.error() from requestLogger.error() to requestLogger.warn(). All error context (tool name, SQL snippet, error message) is still merged into the wide event at WARN severity. Genuine job failures continue to log at ERROR — apps/insights/src/jobs.ts calls logger.error(err) directly on the job-failure path, which is unaffected by this change. The log.error(…) fallback (used outside a job scope) is also unchanged.

Alternative approach: move the level-escalation guard into the job runner itself (only emit at ERROR when job_status: "failed"), but this would require changes to jobs.ts for every call site and doesn't fix the same pattern if it appears in other services using createToolLogger.

Incident on Superlog


Was this PR helpful? Leave feedback — goes straight to the Superlog team.


Summary by cubic

Downgraded recoverable tool errors from ERROR to WARN in job-scoped events to prevent false-positive incidents when a job ultimately succeeds. This keeps successful insights runs from emitting a final ERROR log.

  • Bug Fixes
    • In packages/ai/src/ai/tools/utils/logger.ts, updated createToolLogger.error() to use requestLogger.warn(message, …) instead of requestLogger.error(err, …) so recoverable tool failures don’t escalate the wide event; real job failures still log at ERROR via the job runner.

Written for commit 0487ce1. Summary will update on new commits.

Review in cubic

@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
databuddy-status Ready Ready Preview, Comment Jun 16, 2026 9:31am
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
dashboard Skipped Skipped Jun 16, 2026 9:31am
documentation Skipped Skipped Jun 16, 2026 9:31am

@unkey-deploy

unkey-deploy Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Unkey Deploy

Name Status Preview Inspect Updated (UTC)
api (preview) Ready Visit Preview Inspect Jun 16, 2026 9:31am

@blacksmith-sh

blacksmith-sh Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Found 7 test failures on Blacksmith runners:

Failures

Test View Logs
[chromium] › test/e2e/specs/regressions/
api-keys.spec.ts:4:5 › creates and deletes an API key without leaving confirmation dial
ogs open @regression @core
View Logs
[chromium] › test/e2e/specs/regressions/
website-analytics.spec.ts:31:5 › shows seeded analytics data and applies a topbar filte
r @regression @core
View Logs
[chromium] › test/e2e/specs/regressions/
website-analytics.spec.ts:7:5 › renders analytics controls in the website topbar @regre
ssion @core
View Logs
[chromium] › test/e2e/specs/smoke/
account.spec.ts:3:5 › updates the signed-in user's profile name @smoke
View Logs
[chromium] › test/e2e/specs/smoke/
auth.spec.ts:12:5 › boots an authenticated browser session @smoke
View Logs
[chromium] › test/e2e/specs/smoke/
auth.spec.ts:30:5 › signs out and protects authenticated routes @smoke @core
View Logs
[chromium] › test/e2e/specs/smoke/
auth.spec.ts:3:5 › redirects unauthenticated visitors to sign in @smoke
View Logs

Fix in Cursor

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes false-positive Superlog incidents caused by recoverable AI tool failures (e.g. a failed ClickHouse SQL query or a NOT_FOUND annotation response) escalating the job-level wide event to ERROR even when the job ultimately succeeded. The fix is a one-line change in createToolLogger.error(): the request-scoped call is switched from requestLogger.error() to requestLogger.warn(), preventing evlog from setting hasError = true on the wide event for these recoverable errors.

  • The root cause and fix are accurately diagnosed; the genuine job-failure path in jobs.ts (logger.error(err)logger.emit({ job_status: \"failed\" })) is untouched and continues to emit at ERROR.
  • The change also silently removes new Error(message) (and its stack trace) in favour of passing the raw string to requestLogger.warn().
  • All 30+ logger.error() call sites across tool files are now uniformly downgraded to WARN within a request scope, with no remaining way for a caller to force ERROR level through createToolLogger.

Confidence Score: 4/5

Safe to merge for the stated goal; the job-failure path is untouched and genuine incidents will still fire at ERROR.

The change is small, well-reasoned, and correctly targets the evlog escalation mechanism. The main open question is whether blanket-downgrading every tool logger.error() call to WARN is the right long-term interface contract — currently there is no way for any tool to log at ERROR within a request scope, even for errors that may warrant it. This is a design trade-off worth discussing, not a blocking defect.

The single changed file packages/ai/src/ai/tools/utils/logger.ts and the 30+ tool files that call logger.error() — none of those callers currently have a way to override the WARN downgrade if a future error warrants ERROR-level visibility.

Important Files Changed

Filename Overview
packages/ai/src/ai/tools/utils/logger.ts Downgrades requestLogger.error() to requestLogger.warn() in the request-scoped path of createToolLogger.error(); also removes the new Error(message) wrapper, dropping stack trace capture. The core fix is sound but every logger.error() call site across 30+ tool usages is now silently at WARN with no way to escalate.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Tool as AI Tool (e.g. annotations.ts)
    participant TL as createToolLogger
    participant RL as requestLogger (evlog wide event)
    participant JR as jobs.ts (job runner)

    Note over JR,RL: Job starts — wide event created
    JR->>RL: createInsightsEventLog(context)

    Tool->>TL: logger.error("Failed to create annotation", ctx)

    alt BEFORE this PR
        TL->>RL: "requestLogger.error(new Error(msg), {aiTool, ...ctx})"
        Note over RL: hasError = true — wide event escalated to ERROR
        JR->>RL: "logger.emit({ job_status: "succeeded" })"
        Note over RL: emits at ERROR — false-positive incident
    else AFTER this PR
        TL->>RL: "requestLogger.warn(message, {aiTool, ...ctx})"
        Note over RL: hasError unchanged — wide event stays at INFO/WARN
        JR->>RL: "logger.emit({ job_status: "succeeded" })"
        Note over RL: emits at correct level — no false-positive
    end

    Note over JR,RL: Genuine failure path (unchanged)
    JR->>RL: logger.error(err)
    JR->>RL: "logger.emit({ job_status: "failed" })"
    Note over RL: emits at ERROR — real incident fires correctly
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Tool as AI Tool (e.g. annotations.ts)
    participant TL as createToolLogger
    participant RL as requestLogger (evlog wide event)
    participant JR as jobs.ts (job runner)

    Note over JR,RL: Job starts — wide event created
    JR->>RL: createInsightsEventLog(context)

    Tool->>TL: logger.error("Failed to create annotation", ctx)

    alt BEFORE this PR
        TL->>RL: "requestLogger.error(new Error(msg), {aiTool, ...ctx})"
        Note over RL: hasError = true — wide event escalated to ERROR
        JR->>RL: "logger.emit({ job_status: "succeeded" })"
        Note over RL: emits at ERROR — false-positive incident
    else AFTER this PR
        TL->>RL: "requestLogger.warn(message, {aiTool, ...ctx})"
        Note over RL: hasError unchanged — wide event stays at INFO/WARN
        JR->>RL: "logger.emit({ job_status: "succeeded" })"
        Note over RL: emits at correct level — no false-positive
    end

    Note over JR,RL: Genuine failure path (unchanged)
    JR->>RL: logger.error(err)
    JR->>RL: "logger.emit({ job_status: "failed" })"
    Note over RL: emits at ERROR — real incident fires correctly
Loading

Comments Outside Diff (2)

  1. packages/ai/src/ai/tools/utils/logger.ts, line 21-41 (link)

    P2 error() method permanently downgrades all callers to WARN — no escape hatch for genuine tool-level errors

    Every call site across the tools (annotations, flags, funnels, goals, links, query, rpc — 30+ logger.error() call sites) is now silently logged at WARN in any request-scoped context, with no way for a caller to force ERROR level through createToolLogger. If a future tool encounters an error that is not recoverable but still gets caught-and-returned-as-a-tool-response (so the job itself succeeds), engineers will only see a WARN in the wide event with no alerting surface. The existing warn() method already exists on the returned object; callers that genuinely need WARN could have used it directly, leaving error() at ERROR for the cases that warrant it.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. packages/ai/src/ai/tools/utils/logger.ts, line 21-33 (link)

    P2 Stack trace dropped when switching from Error object to plain string

    The old code created new Error(message) and passed it to requestLogger.error(err, …), which lets evlog capture a stack trace pinning the exact tool call site. The new call passes message (a plain string) to requestLogger.warn(), so the stack is no longer attached. At WARN severity this is less critical, but it does make it harder to identify exactly which code path produced a logged tool failure during an investigation.

Reviews (1): Last reviewed commit: "[superlog] Downgrade recoverable AI tool..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants